Merged
Update license field from deprecated table format to simple string format to comply with modern setuptools standards and eliminate deprecation warnings.
Changed from SPDX string format back to {text = "MIT"} format for
compatibility with current PyPI infrastructure which does not yet
support the License-Expression metadata field.
igerber
pushed a commit
that referenced
this pull request
Jan 4, 2026
Review fixes:
- Add edge case validation in _compute_flci (se > 0, 0 < alpha < 1)
- Improve significance_stars docstring explaining partial identification
- Standardize error messages to include parameter values (M, Mbar, alpha)
- Make LP solver method configurable in _solve_bounds_lp
- Add clarifying comment about constraint matrix design for pre+post periods
- Improve CallawaySantAnna error message with actionable guidance
Notes:
- #4 (sensitivity_plot export) was verified as valid: the function exists at honest_did.py:1437
- #1 (pre-period effects) verified correct: LP optimization covers all periods but only post-periods contribute to the objective function
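The edge-case validation and the "include parameter values" convention from the review fixes can be illustrated with a minimal sketch. The function name `validate_flci_inputs` and the exact message wording are hypothetical; this is not the library's `_compute_flci` code, only the two documented conditions (`se > 0`, `0 < alpha < 1`) written out.

```python
def validate_flci_inputs(se: float, alpha: float, M: float) -> None:
    """Hypothetical guard mirroring the documented conditions.

    Raises ValueError with the offending parameter values in the message,
    so a caller can see which input was out of range.
    """
    # `not (se > 0)` also rejects NaN, because NaN > 0 is False.
    if not (se > 0):
        raise ValueError(
            f"se must be > 0, got se={se} (M={M}, alpha={alpha})"
        )
    if not (0 < alpha < 1):
        raise ValueError(
            f"alpha must satisfy 0 < alpha < 1, got alpha={alpha} (M={M}, se={se})"
        )


# Valid inputs pass silently.
validate_flci_inputs(se=0.5, alpha=0.05, M=1.0)
```

Putting the parameter values in the message is a design choice for debuggability: a failing sensitivity sweep over many `M` values then reports which grid point triggered the error.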
igerber
pushed a commit
that referenced
this pull request
Jan 4, 2026
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
Phase 2 silent-failures audit, axis-G (backend parity). Closes the coverage gap the audit flagged in three Rust-backed solver surfaces. Test-only PR; any discovered divergences are marked `xfail(strict=True)` and logged to `TODO.md` as P1 follow-ups rather than fixed in-scope.
Finding #21, `solve_ols` skip-rank-check parity (`linalg.py:369-373, 597-639`): three parity tests in `TestSolveOLSSkipRankCheckParity` covering mixed-scale columns (norm ratio > 1e6), near-singular full-rank (cond > 1e10), and rank-deficient collinear designs under `skip_rank_check=True` on HC1. Backends agree on fitted values within `rtol=1e-6, atol=1e-8`. All pass; no Rust-side code change needed.
Finding #22, `compute_synthetic_weights` parity (`utils.py:1134-1199`): three parity tests in `TestSyntheticWeightsBackendParity`. Near-singular `Y'Y` passes at `atol=1e-7`; extreme Y scale (1e9) and lambda_reg variations are `xfail(strict=True)` with a baselined ~15-80% weight divergence. Root cause: Rust path is Frank-Wolfe, Python fallback is projected gradient descent (`utils.py:1228`); same QP, different simplex vertices under near-degenerate inputs.
Finding #23, TROP Rust grid-search + bootstrap parity (`trop_global.py:688-750, 966-1006`): two parity tests in `TestTROPRustEdgeCaseParity`, `@pytest.mark.slow` class-level. Both `xfail(strict=True)`: grid-search ATT on rank-deficient Y (~6% divergence), bootstrap SE under `seed=42` (~28% divergence, RNG backend mismatch: Rust `rand` crate vs numpy `default_rng`).
Plan governance:
- Per `feedback_ci_reviewer_pattern_checks`, grepped adjacent Rust entry points (`_solve_ols_rust`, `_rust_synthetic_weights`, `_rust_loocv_grid_search_global`, `_rust_bootstrap_trop_variance_global`); no additional silent-fallback surfaces identified.
- Per plan Non-goal #4, did not open an axis-H finding on TROP's `seed=None → 0` substitution at `trop_global.py:994` (out of scope).
- No behavioral changes, no warnings, no REGISTRY changes, no flags.
TODO.md logs three P1 follow-up entries: algorithmic unification for `compute_synthetic_weights` (FW vs PGD), TROP grid-search divergence on rank-deficient Y, TROP bootstrap RNG unification.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
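The parity criterion quoted above (`rtol=1e-6, atol=1e-8` on fitted values) can be sketched without any of the library's code. The two "backends" here are hypothetical stand-ins (two algebraically equivalent ways to fit a one-regressor OLS line), not `solve_ols` or its Rust counterpart; the tolerance rule is the element-wise test `|a - b| <= atol + rtol * |b|` that `numpy.allclose` applies.

```python
def allclose(a, b, rtol=1e-6, atol=1e-8):
    """Element-wise |a - b| <= atol + rtol * |b| (the numpy.allclose rule)."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


def fit_centered(x, y):
    """Stand-in backend 1: slope from centered cross-products."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return [intercept + slope * xi for xi in x]


def fit_normal_equations(x, y):
    """Stand-in backend 2: solve the 2x2 normal equations by Cramer's rule."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy * sxx - sx * sxy) / det
    return [intercept + slope * xi for xi in x]


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 2.9, 5.1, 7.0, 9.2]
backends_agree = allclose(fit_centered(x, y), fit_normal_equations(x, y))
```

In a pytest suite, a comparison like this that is known to diverge would carry `@pytest.mark.xfail(strict=True)`, so the test starts failing loudly once the divergence is fixed and the marker becomes stale.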
igerber
added a commit
that referenced
this pull request
Apr 19, 2026
Closes BR/DR foundation gap #4 (real-dataset validation) from the external-positioning gap list in ``project_br_dr_foundation.md``.
Validation artifact:
- ``docs/validation/validate_br_dr_canonical.py`` runs BusinessReport / DiagnosticReport on Card-Krueger (1994), mpdta (Callaway-Sant'Anna 2021 benchmark), and Castle Doctrine (Cheng-Hoekstra 2013 under both CS and SA), dumping summary + full_report + selected to_dict blocks for each.
- ``docs/validation/br_dr_canonical_validation.md`` is the regenerable raw output.
- ``docs/validation/br_dr_canonical_findings.md`` is the hand-written synthesis: direction / verdict / sensitivity tier all match canonical interpretations, with two small wording bugs surfaced and fixed in this PR and two larger gaps queued as follow-up (SA HonestDiD applicability, target-parameter disambiguation).
Wording fixes:
1. Treatment-label capitalization. ``str.capitalize()`` lowercased every character after the first, flattening embedded abbreviations (``"the NJ minimum-wage increase"`` → ``"The nj minimum-wage increase"``) and proper-noun phrases (``"Castle Doctrine law adoption"`` → ``"Castle doctrine law adoption"``). Replaced with a ``_sentence_first_upper`` helper that preserves user-supplied casing.
2. ``breakdown_M == 0`` phrasing. The HonestDiD fragile sentence quoted ``{breakdown_M:.2g}x the pre-period variation``, which renders as a degenerate ``0x`` on the exact-zero case surfaced by Cheng-Hoekstra. At ``breakdown_M <= 0.05`` (covers 0 and near-zero values), both BR's summary and DR's overall_interpretation now say "includes zero even at the smallest parallel-trends violations on the sensitivity grid" instead.
Tests: 5 new regressions in ``TestCanonicalValidationSurfaceFixes`` covering both fixes + three boundary cases (exact zero, small positive, normal fragile value).
Not in scope: Favara-Imbs (dCDH reversible-treatment dataset not bundled), ImputationDiD / TwoStageDiD on canonical data (needed to exercise the R42 untreated-outcome FE assumption branch on real data), SA HonestDiD applicability gap. All tracked in the findings doc for follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
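Both wording fixes are easy to reproduce in isolation. The sketch below is an assumption about how such helpers could look, not the code in the PR: `sentence_first_upper` is a guess at what a helper like `_sentence_first_upper` does, and `fragile_phrase` is a hypothetical name that only reuses the two phrasings and the `0.05` threshold quoted in the commit message.

```python
def sentence_first_upper(text: str) -> str:
    """Uppercase only the first character; leave the rest untouched.

    Unlike str.capitalize(), this keeps abbreviations such as "NJ" intact.
    Slicing with [:1] makes the empty string a safe input.
    """
    return text[:1].upper() + text[1:]


def fragile_phrase(breakdown_M: float) -> str:
    """Pick the sentence fragment for a fragile HonestDiD result (sketch)."""
    if breakdown_M <= 0.05:
        # Avoids rendering a degenerate "0x the pre-period variation".
        return (
            "includes zero even at the smallest parallel-trends "
            "violations on the sensitivity grid"
        )
    return f"{breakdown_M:.2g}x the pre-period variation"


label = "the NJ minimum-wage increase"
flattened = label.capitalize()           # "The nj minimum-wage increase"
preserved = sentence_first_upper(label)  # "The NJ minimum-wage increase"
```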
igerber
added a commit
that referenced
this pull request
Apr 20, 2026
Close BR/DR gap #4: canonical-dataset regression guards + wording fixes
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
…nuousDiD prerequisite list as profile-side screening + add first_treat caveat
P1 (the five profile-derived facts are not the "full" gate set):
Reviewer correctly noted that calling
`{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}` the
"full ContinuousDiD pre-fit gate set" overreaches. `profile_panel`
only sees the four columns it accepts and CANNOT see the separate
`first_treat` column that `ContinuousDiD.fit()` consumes. Verified
against `continuous_did.py:230-360`: `fit()` additionally rejects
NaN/inf/negative `first_treat`, drops units with `first_treat > 0`
AND `dose == 0`, and force-zeroes `first_treat == 0` rows whose
`dose != 0` with a `UserWarning`. A panel that passes all five
profile-side checks can still surface warnings, drop rows, or raise
at fit time depending on the `first_treat` column the caller
supplies.
Reframed the wording in five surfaces from "full gate set" to
"profile-side screening checks" with an explicit caveat that the
checks are necessary-but-not-sufficient and that `ContinuousDiD.fit()`
applies separate `first_treat` validation:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells
out the screening framing explicitly + lists the `first_treat`
validations that fit() applies).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
(aligned with public contract: most fields descriptive,
`dose_min > 0` is one of the screening checks).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (rewrote
the multi-paragraph block to describe screening + first_treat
caveat).
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design
feature paragraph: screening checks + necessary-not-sufficient
language + pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example
reasoning chain (rewrote step 2 to call out screening +
first_treat caveat; clarified counter-example #4 that
`P(D=0) > 0` is required under BOTH `control_group="never_treated"`
and `"not_yet_treated"`, not just default).
- `CHANGELOG.md` Unreleased entry.
- `ROADMAP.md` AI-Agent Track.
P2 (test coverage for the missing `first_treat` caveat):
Added a content-stability assertion in `tests/test_guides.py`:
`assert "first_treat" in text` so the autonomous guide cannot
silently drop the explicit `first_treat` validation caveat.
P3 (helper / test-name inconsistency with public contract):
Renamed `test_treatment_dose_does_not_gate_continuous_did` to
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and rewrote its docstring to match the now-honest public contract
("most fields descriptive distributional context that supplements
the existing top-level screening checks"). The test body still
asserts the same two things — `treatment_varies_within_unit` fires
True on `0,0,d,d` paths and `has_never_treated` is independent of
`has_zero_dose` — both of which remain accurate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
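To make the five profile-side facts concrete, here is a minimal sketch that computes them from `(unit, time, dose)` rows. It is a hypothetical re-implementation for illustration, not `profile_panel`: the function name, the row layout, and the reading of `dose_min` as the minimum over nonzero doses are all assumptions. Note that nothing in it can see a `first_treat` column, which is the point the reviewer raised.

```python
from collections import Counter, defaultdict


def screening_facts(rows):
    """Compute five dose-column facts from (unit, time, dose) rows (sketch)."""
    rows = list(rows)
    doses_by_unit = defaultdict(list)
    times_by_unit = defaultdict(set)
    for unit, time, dose in rows:
        doses_by_unit[unit].append(dose)
        times_by_unit[unit].add(time)

    all_times = set().union(*times_by_unit.values())
    cell_counts = Counter((unit, time) for unit, time, _ in rows)
    nonzero_doses = [dose for _, _, dose in rows if dose != 0]

    return {
        # Assumed meaning: some unit has dose 0 in every observed period.
        "has_never_treated": any(
            all(d == 0 for d in doses) for doses in doses_by_unit.values()
        ),
        "treatment_varies_within_unit": any(
            len(set(doses)) > 1 for doses in doses_by_unit.values()
        ),
        "is_balanced": all(t == all_times for t in times_by_unit.values()),
        "has_duplicate_unit_time_rows": any(c > 1 for c in cell_counts.values()),
        # Assumed meaning: the smallest nonzero dose is strictly positive.
        "dose_min_positive": bool(nonzero_doses) and min(nonzero_doses) > 0,
    }


panel = [
    ("a", 1, 0.0), ("a", 2, 0.0),   # never-treated unit
    ("b", 1, 2.0), ("b", 2, 2.0),   # constant positive dose
]
facts = screening_facts(panel)
```

A panel can pass all five of these and still warn, drop rows, or raise at fit time, because the outcome depends on the separately supplied `first_treat` column.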
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
…light checks as standard-workflow predictions, not estimator gates
Reviewer correctly noted that calling
{has_never_treated, treatment_varies_within_unit==False,
is_balanced, no duplicate_unit_time_rows alert, dose_min > 0}
the "screening checks" / "necessary" gates of `ContinuousDiD`
overstates the contract. `ContinuousDiD.fit()` keys off the
separate `first_treat` column (which `profile_panel` does not see),
defines never-treated controls as `first_treat == 0` rows,
force-zeroes nonzero `dose` on those rows with a `UserWarning`,
and rejects negative dose only among treated units `first_treat > 0`
(see `continuous_did.py:276-327` and `:348-360`).
Two of the five checks (`has_never_treated`, `dose_min > 0`) are
first_treat-dependent: agents who relabel positive- or negative-dose
units as `first_treat == 0` trigger the force-zero coercion path
with a `UserWarning` and may still fit panels that fail those
preflights, with the methodology shifting. The other three
(`treatment_varies_within_unit`, `is_balanced`, duplicate-row
absence) are real fit-time gates that hold regardless of how
`first_treat` is constructed.
Reframed every wording site to call these "standard-workflow
preflight checks" — predictive when the agent derives `first_treat`
from the same dose column passed to `profile_panel`, but not the
estimator's literal contract:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote
the multi-paragraph block; explicit standard-workflow definition
+ per-check first_treat dependency map + force-zero coercion
caveat).
- `diff_diff/profile.py` `_compute_treatment_dose` helper docstring
(already brief; stays consistent).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (long
rewrite covering the standard-workflow framing + override paths).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
trailing paragraph (both updated; opening bullet now spells out
which of the five checks are first_treat-dependent vs. hard
fit-time stops; trailing paragraph promotes the standard-
workflow framing).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step
2 (rewrote the gate-checking paragraph; counter-example #4
expanded to enumerate (a) supply matching first_treat and accept
rejection, (b) deliberate relabel + coercion, (c) different
estimator; counter-example #5 distinguishes negative-dose
treated-unit rejection from never-treated coercion).
- `CHANGELOG.md` Wave 2 entry (matches the new framing).
- `ROADMAP.md` AI-Agent Track building block (matches).
Test coverage:
- Renamed assertion messages in
`test_treatment_dose_descriptive_fields_supplement_existing_gates`
and `test_treatment_dose_min_flags_negative_dose_continuous_panels`
to remove "authoritative gate" phrasing; reframed as "standard-
workflow preflight" assertions consistent with the corrected docs.
- Added `test_negative_dose_on_never_treated_coerces_not_rejects`
in `tests/test_continuous_did.py::TestEdgeCases` covering the
reviewer's specific request: never-treated rows with NEGATIVE
nonzero dose must coerce (with `UserWarning`) rather than raise
the treated-unit negative-dose error. Sister to the existing
`test_nonzero_dose_on_never_treated_warns` which covers the
positive-dose case.
Rebased onto origin/main during this round (no conflicts beyond
prior CHANGELOG resolutions; main advanced 19 commits).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
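The `first_treat` handling described in this message can be sketched as a small pure-Python function. This is an illustration of the stated behavior only (reject NaN/inf/negative `first_treat`, force-zero nonzero dose on `first_treat == 0` rows with a `UserWarning`, reject negative dose among treated units, drop treated units with zero dose). The function name, the per-unit dict layout, the message text, and the order of checks are assumptions, not the code in `continuous_did.py`.

```python
import math
import warnings


def apply_first_treat_rules(units):
    """units: list of dicts with 'unit', 'first_treat', 'dose' (sketch)."""
    kept = []
    for u in units:
        first_treat, dose = u["first_treat"], u["dose"]
        if not math.isfinite(first_treat) or first_treat < 0:
            raise ValueError(f"invalid first_treat={first_treat} for unit {u['unit']}")
        if first_treat == 0:
            # Never-treated row: any nonzero dose (positive OR negative)
            # is coerced to zero with a warning rather than rejected.
            if dose != 0:
                warnings.warn(
                    f"unit {u['unit']}: nonzero dose {dose} on a never-treated "
                    "row was set to 0",
                    UserWarning,
                )
                dose = 0.0
        else:
            # Treated unit: negative dose is an error, zero dose is dropped.
            if dose < 0:
                raise ValueError(f"negative dose={dose} for treated unit {u['unit']}")
            if dose == 0:
                continue
        kept.append({**u, "dose": dose})
    return kept
```

The asymmetry is what the new `test_negative_dose_on_never_treated_coerces_not_rejects` test pins down: the same negative dose raises on a treated unit but is coerced on a never-treated one.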
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
…s-fallback wording; correct duplicate-row "fit-time stop" claim
P1 (relabel-to-manufacture-controls misframing):
Round 11 introduced wording across the guide, profile docstring,
CHANGELOG, ROADMAP, and test docstrings that presented intentional
`first_treat == 0` relabeling of nonzero-dose units as an
"option" / "fallback" for fitting `ContinuousDiD` when the
profile-side preflights (`has_never_treated`, `dose_min > 0`)
fail. REGISTRY does not document this as a routing option, and the
estimator still requires actual `P(D=0) > 0` because Remark 3.1
lowest-dose-as-control is not yet implemented. The force-zero
coercion at `continuous_did.py:311-327` is implementation behavior
for INCONSISTENT inputs (e.g., user accidentally passes nonzero
dose on a never-treated row), not a methodology fallback.
Reworded every site to remove the relabeling-as-option framing and
replace it with the registry-documented fixes when (1) or (5)
fails: re-encode the treatment column to a non-negative scale that
contains a true never-treated group, or route to a different
estimator (`HeterogeneousAdoptionDiD` for graded-adoption panels;
linear DiD with the treatment as a continuous covariate). Every
remaining "manufacture controls" mention in the guide, profile,
and tests is now an explicit anti-recommendation ("do not relabel
... to manufacture controls"). Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (item (1):
"not an opportunity to relabel ..."; item (5): coercion is
"implementation behavior for inconsistent inputs, not a
methodological fallback").
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (the
When-(1)-or-(5)-fails paragraph names re-encode + alternative
estimator only; explicit anti-relabel warning).
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet +
trailing paragraph (consolidated; opening bullet drops the
relabel-as-fallback framing; trailing paragraph trimmed to a
pointer to §2).
- `diff_diff/guides/llms-autonomous.txt` §5.2 step 2 + counter-
example #4 + counter-example #5 (relabel-as-option language
removed; explicit "do not relabel" callouts; counter-example #4
options trimmed to (a) re-encode and (b) different estimator).
- `CHANGELOG.md` (relabel-as-option clause removed; replaced with
re-encode / different-estimator framing).
- `ROADMAP.md` (same).
- `tests/test_profile_panel.py` two test docstrings (relabel-as-
workflow language removed).
P2 (duplicate-row "hard fit-time stop" misclaim):
Round 11 wording said "duplicate-row failures are hard fit-time
stops" — incorrect. `_precompute_structures` at
`continuous_did.py:818-823` silently overwrites with last-row-wins,
no exception raised. Reworded as "hard preflight veto: the agent
must deduplicate before fit because `ContinuousDiD` otherwise uses
last-row-wins, no fit-time exception" in profile.py docstring,
guide §4.7 opening bullet, and §5.2 step 2 (now defers to §2 for
the breakdown). The previously-correct §2 description of the
silent-coerce path is preserved.
Length housekeeping:
The round-11/round-12 expansion pushed `llms-autonomous.txt`
above `llms-full.txt`, breaking `test_full_is_largest`. Trimmed
~2.7KB by consolidating the §4.7 trailing paragraph + §5.2 step 2
trailing block to point at §2's full breakdown rather than
duplicating the per-check semantics. autonomous: 65364 chars,
full: 66058 chars.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
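The duplicate-row point in P2 is worth seeing in miniature: building a `(unit, time)`-keyed structure from rows overwrites silently, so the only safeguard is a check before fitting. The sketch below uses hypothetical helper names and a plain dict; it illustrates last-row-wins in general and is not the `_precompute_structures` code.

```python
from collections import Counter


def last_row_wins(rows):
    """Index (unit, time, value) rows by cell; later rows overwrite silently."""
    cells = {}
    for unit, time, value in rows:
        cells[(unit, time)] = value  # no exception on a repeated key
    return cells


def find_duplicate_cells(rows):
    """Preflight check: return the (unit, time) cells that occur more than once."""
    counts = Counter((unit, time) for unit, time, _ in rows)
    return sorted(cell for cell, n in counts.items() if n > 1)


rows = [("a", 1, 10.0), ("a", 1, 99.0), ("a", 2, 11.0)]
indexed = last_row_wins(rows)            # ("a", 1) now maps to 99.0
duplicates = find_duplicate_cells(rows)  # [("a", 1)]
```

Because the overwrite raises nothing, the commit's "hard preflight veto" wording fits: the agent has to run a check like `find_duplicate_cells` and deduplicate before calling the estimator.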
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
… first_treat from dose" framing; add PanelProfile backward-compat defaults; fix test_continuous_did docstring
P1 (canonical ContinuousDiD setup vs. derive-from-dose framing):
Round 12 introduced a "standard workflow" description across the guide, profile docstring, CHANGELOG, ROADMAP, and test docstrings that said agents derive `first_treat` from the same dose column passed to `profile_panel`. Reviewer correctly noted this conflicts with the actual ContinuousDiD contract (`continuous_did.py:222-228`, `prep_dgp.py:970-993`, `docs/methodology/continuous-did.md:65-73`): the canonical setup uses a **time-invariant per-unit dose** `D_i` and a **separate `first_treat` column** the caller supplies. The dose column has no within-unit time variation in this setup, so it cannot encode timing. An agent following the rejected framing would either build a `0,0,d,d` path (which `fit()` rejects) or keep a valid constant-dose panel (in which case the dose column carries no timing information).
Reworded every site to drop the derive-from-dose framing and replace with the canonical setup. The five facts on the dose column remain predictive of `fit()` outcomes BECAUSE the canonical convention ties `first_treat == 0` to `D_i == 0` and treated units carry their constant dose across all periods, so `has_never_treated` proxies `P(D=0) > 0` and `dose_min > 0` predicts the strictly-positive-treated-dose requirement, without any "derivation" of `first_treat` from the dose column.
Updated:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (rewrote the multi-paragraph block to use the canonical-setup framing and added an explicit "agent must validate `first_treat` independently" note).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet.
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain step 2 + counter-examples #4 and #5 (now describe the canonical setup rather than a derive-from-dose workflow).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` `test_treatment_dose_min_flags_negative_dose_continuous_panels` docstring/comments.
P2 (PanelProfile direct-construction backward compat):
Wave 2 added `outcome_shape` and `treatment_dose` to PanelProfile without defaults, breaking direct `PanelProfile(...)` calls that predate Wave 2. Made both fields default to `None` (moved them to the end of the field list; both are `Optional[...]`). Added `test_panel_profile_direct_construction_without_wave2_fields` asserting that direct construction without the new fields succeeds and yields `None` defaults that serialize correctly through `to_dict()`.
P3 (test_continuous_did.py docstring overstating sanction):
The new `test_negative_dose_on_never_treated_coerces_not_rejects` docstring said the contract "lets agents legally relabel negative-dose units as `first_treat == 0` to coerce them away." Reworded as observed implementation behavior for inconsistent inputs, NOT a sanctioned routing option; the test locks in the coercion contract while the autonomous guide §5.2 explicitly tells agents not to use this path methodologically.
Length invariant maintained: autonomous (65748 chars) < full (66031 chars); `test_full_is_largest` still passes (compares character count, not byte count, so on-disk size with UTF-8 multi-byte characters differs from the assertion target).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
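The P2 backward-compatibility fix follows a standard dataclass pattern: new fields go at the end of the field list with a `None` default, so older positional or keyword construction keeps working. The sketch below assumes `PanelProfile` is a dataclass and invents three pre-existing fields (`n_units`, `n_periods`, `is_balanced`) purely for illustration; only `outcome_shape`, `treatment_dose`, and `to_dict()` come from the commit message.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PanelProfileSketch:
    # Hypothetical pre-Wave-2 fields (required, no defaults).
    n_units: int
    n_periods: int
    is_balanced: bool
    # Wave 2 additions: placed last and defaulted, because dataclasses
    # do not allow a non-default field after a defaulted one.
    outcome_shape: Optional[Dict[str, Any]] = None
    treatment_dose: Optional[Dict[str, Any]] = None

    def to_dict(self) -> Dict[str, Any]:
        """Serialize to a plain dict; None defaults pass through unchanged."""
        return asdict(self)


# A call written before the new fields existed still constructs cleanly.
legacy = PanelProfileSketch(n_units=10, n_periods=4, is_balanced=True)
legacy_dict = legacy.to_dict()
```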
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
…fixes" overclaim for ContinuousDiD recoding
P1 (overclaiming registry endorsement of recoding):
Reviewer correctly noted the round-13/14 wording across the public-facing surfaces called re-encoding the treatment column a "registry-documented fix" / "documented option" / "documented fallback". REGISTRY only documents the `P(D=0) > 0` requirement and explicitly notes Remark 3.1's lowest-dose-as-control fallback is NOT implemented in this library. Re-encoding is an agent-side preprocessing choice that the registry neither endorses nor forbids; calling it "registry-documented" was an over-claim.
Reworded twelve sites to drop the "documented" framing:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (items (1) and (5)).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference When-(1)-or-(5)-fails paragraph.
- `diff_diff/guides/llms-autonomous.txt` §4.7 opening bullet trailing language.
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph (consolidated to a pointer at §2; reduced redundancy).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain counter-example #4.
- `tests/test_profile_panel.py` two test docstrings + one inline assertion message + one trailing comment.
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
The corrected framing across all surfaces:
- Honestly state the contract: `ContinuousDiD` requires `P(D=0) > 0` and positive treated doses; Remark 3.1 not implemented.
- When the contract isn't met, say `ContinuousDiD` "as currently implemented does not apply", not "do this fix."
- Mention routing alternatives that ARE in the library and DON'T require `P(D=0) > 0`: `HeterogeneousAdoptionDiD`, linear DiD with a continuous covariate. Those are routing facts, not methodology endorsements.
- Re-encoding stays in the prose as an "agent-side preprocessing choice that changes the estimand and is not documented in REGISTRY as a supported fallback", explicitly NOT endorsed.
Length housekeeping: trimmed redundancy in the §4.7 trailing paragraph (consolidated to a pointer at §2) and tightened the §2 recoding paragraph; autonomous (65984 chars) < full (66031), `test_full_is_largest` green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
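The length invariant is quoted in characters, which matters because the guides contain non-ASCII characters such as `§` and `→`. The sketch below shows the difference between character count and UTF-8 byte count, plus a hypothetical check in the spirit of `test_full_is_largest`; the function names and the dict layout are assumptions, not the test's actual code.

```python
def char_and_byte_lengths(text: str):
    """Return (character count, UTF-8 byte count) for a string."""
    return len(text), len(text.encode("utf-8"))


def full_is_largest(guides: dict) -> bool:
    """Hypothetical invariant: the 'full' guide has the most characters."""
    full_len = len(guides["full"])
    return all(len(text) <= full_len for text in guides.values())


# "§" is one character but two bytes in UTF-8, so the counts differ.
section_ref_lengths = char_and_byte_lengths("§2")  # (2, 3)
```

A file that looks larger on disk can therefore still satisfy a character-based assertion, and vice versa.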
igerber
added a commit
that referenced
this pull request
Apr 25, 2026
…s "negative dose" branches; HAD only valid on the former
Reviewer correctly noted that the round-15/16 wording listed `HeterogeneousAdoptionDiD` as a routing alternative whenever `ContinuousDiD` fails on the dose-related preflights, but HAD itself requires non-negative dose support and raises on negative post-period dose at `had.py:1450-1459` (paper Section 2). On a panel with `dose_min < 0`, routing to HAD silently steers an agent into the same fit-time error. Verified the rejection at `had.py:1450-1459`.
Reworded every site to split the two failure modes:
- Branch (a): `has_never_treated == False` (no zero-dose controls but all observed doses non-negative). `ContinuousDiD` does not apply (Remark 3.1 not implemented). HAD IS a routing alternative on this branch (HAD's contract requires non-negative dose, satisfied here); linear DiD with a continuous covariate is another.
- Branch (e): `dose_min < 0` (negative treated doses). `ContinuousDiD` does not apply AND HAD is **not** a fallback either; HAD raises on negative post-period dose (`had.py:1450-1459`). Linear DiD with a signed continuous covariate is the applicable alternative on this branch.
Updated wording across:
- `diff_diff/profile.py` `TreatmentDoseShape` docstring (refactored from item-by-item duplication into a numbered list with a single "Routing alternatives when (1) or (5) fails" section that splits the two branches; trimmed redundancy).
- `diff_diff/guides/llms-autonomous.txt` §2 field reference (split the When-(1)-or-(5)-fails paragraph into the two branches).
- `diff_diff/guides/llms-autonomous.txt` §4.7 trailing paragraph (consolidated to a pointer at §2's split discussion).
- `diff_diff/guides/llms-autonomous.txt` §5.2 reasoning chain counter-example #4 (no never-treated branch: HAD applies) and counter-example #5 (negative-dose branch: HAD does NOT apply, cite `had.py:1450-1459`).
- `CHANGELOG.md` Wave 2 entry.
- `ROADMAP.md` AI-Agent Track building block.
- `tests/test_profile_panel.py` two test docstrings/comments.
Added `test_autonomous_negative_dose_path_does_not_route_to_had` in `tests/test_guides.py` asserting that §5.2 explicitly cites `had.py:1450-1459` on the negative-dose branch (used a single-line fingerprint since the prose phrase "non-negative dose support" is split across newlines in the rendered guide).
Length housekeeping: trimmed counter-example #4 and #5 prose + §4.7 trailing paragraph to point at §2's split discussion; autonomous (65374 chars) < full (66031), `test_full_is_largest` green.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
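The two-branch split can be written down as a small decision function. This is a sketch of the routing facts as the commit message states them, with a hypothetical function name and plain strings for the alternatives; it is not library code and does not call any estimator.

```python
def routing_alternatives(has_never_treated: bool, dose_min: float):
    """Alternatives to ContinuousDiD when a dose-related preflight fails (sketch).

    Returns an empty list when neither failure mode applies.
    """
    if dose_min < 0:
        # Negative-dose branch: HAD also raises on negative post-period
        # dose, so it must not be offered here.
        return ["linear DiD with a signed continuous covariate"]
    if not has_never_treated:
        # No zero-dose controls, but all doses non-negative: HAD's
        # non-negative-dose requirement is satisfied.
        return [
            "HeterogeneousAdoptionDiD",
            "linear DiD with a continuous covariate",
        ]
    return []


no_controls = routing_alternatives(has_never_treated=False, dose_min=0.5)
negative_dose = routing_alternatives(has_never_treated=False, dose_min=-1.0)
```

Checking the negative-dose condition first is deliberate: a panel can fail both preflights at once, and in that case the more restrictive branch has to win so HAD is never suggested.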